Initial Look at Data

Dataset

Data Summary

Categorical Variables in Bar Charts

Numeric Variables in Histograms

Time Series Variables

Outliers and Notes

Categorical Variable Notes

A closer look at dna_visittrafficsubtype shows that many of the subtypes are rarely found in this dataset.
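
One way to quantify "rarely found" is to compute each subtype's share of rows and lump the levels below a threshold into a single bucket. The sketch below assumes pandas; the helper `lump_rare_levels`, the `min_share` threshold, and the subtype values in the toy data are all made up for illustration and are not taken from the dataset.

```python
import pandas as pd


def lump_rare_levels(values: pd.Series, min_share: float = 0.01,
                     other_label: str = "Other") -> pd.Series:
    """Replace levels whose share of non-missing rows is below min_share."""
    # Share of rows per level; missing values are excluded by default.
    shares = values.value_counts(normalize=True)
    rare = shares[shares < min_share].index
    # Keep common levels (and missing values) as-is, relabel the rare ones.
    return values.where(~values.isin(rare), other_label)


# Toy data with invented subtype names (hypothetical, for illustration only).
toy = pd.Series(
    ["direct"] * 60 + ["paid search"] * 38 + ["email"] + ["affiliate"],
    name="dna_visittrafficsubtype",
)
lumped = lump_rare_levels(toy, min_share=0.05)
print(lumped.value_counts())
```

The threshold is a judgment call: a higher `min_share` gives fewer, better-populated levels at the cost of hiding detail in the rare subtypes.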

Time Series Variables (improved)

After removing the outlier dates (noted above) for ordercreatedate, we can better see the general trend.

After removing the NAs from dnatestactivationdayid, we can better see the general trend.
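
The two cleaning steps in this section (dropping missing dates and trimming dates outside a plausible range) can be sketched as a single helper. This assumes pandas and assumes the column can be parsed as a date; the helper `daily_counts`, the `start` cutoff, and the toy rows are hypothetical and do not reflect the actual outlier dates in the data.

```python
import pandas as pd


def daily_counts(frame: pd.DataFrame, date_col: str,
                 start=None, end=None) -> pd.Series:
    """Count rows per day after dropping missing and out-of-range dates."""
    # Unparseable or missing values become NaT and are dropped.
    dates = pd.to_datetime(frame[date_col], errors="coerce").dropna()
    if start is not None:
        dates = dates[dates >= pd.Timestamp(start)]
    if end is not None:
        dates = dates[dates <= pd.Timestamp(end)]
    # Strip any time-of-day component, then count rows per calendar day.
    return dates.dt.normalize().value_counts().sort_index()


# Toy rows (invented): one implausibly old date and one missing value.
toy = pd.DataFrame(
    {"ordercreatedate": ["1900-01-01", "2016-03-01", "2016-03-01",
                         "2016-03-02", None]}
)
counts = daily_counts(toy, "ordercreatedate", start="2015-01-01")
print(counts)
```

Filtering before counting keeps a handful of extreme dates from stretching the time axis and flattening the trend that the plot is meant to show.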

Cross-sell Percentage

Daily Trend

The variance appears to tighten in 2016-2017, and there is an obvious drop from late 2016 into 2017. Both behaviors will cause problems for most models: forecasting or prediction could prove difficult if the model cannot account for them.

A more detailed view of this daily xsell conversion may help us understand what is influencing this behavior and how that might affect model construction.
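
As a starting point for that more detailed view, the daily cross-sell percentage and its rolling mean and standard deviation can be computed as below. This is a sketch under assumptions: it uses pandas, it assumes one row per order with a 0/1 cross-sell flag, and the column name `xsell`, the helper names, and the toy rows are hypothetical.

```python
import pandas as pd


def daily_xsell_rate(frame: pd.DataFrame, date_col: str = "ordercreatedate",
                     flag_col: str = "xsell") -> pd.Series:
    """Percentage of rows per day where the cross-sell flag is 1."""
    dates = pd.to_datetime(frame[date_col]).dt.normalize()
    return frame[flag_col].groupby(dates).mean().mul(100).rename("xsell_pct")


def rolling_summary(rate: pd.Series, window: int = 30) -> pd.DataFrame:
    """Rolling mean and standard deviation over `window` observed days."""
    # The window counts rows, so days with no orders are skipped, not zero-filled.
    rolled = rate.rolling(window, min_periods=window)
    return pd.DataFrame({"rolling_mean": rolled.mean(),
                         "rolling_std": rolled.std()})


# Toy rows (invented): two orders per day over three days.
toy = pd.DataFrame({
    "ordercreatedate": ["2016-01-01", "2016-01-01", "2016-01-02",
                        "2016-01-02", "2016-01-03", "2016-01-03"],
    "xsell": [1, 0, 1, 1, 0, 0],
})
rate = daily_xsell_rate(toy)
summary = rolling_summary(rate, window=2)
print(rate)
print(summary)
```

Plotting `rolling_std` over time is one way to check whether the variance really tightens in 2016-2017, and a sustained shift in `rolling_mean` would mark the late-2016 drop.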

I must not understand the regtenure column